So welcome everybody. Welcome back to our class in deep learning and today we want to look at
recurrent neural networks and how to process sequences. So yeah, we'll have a short look
into the motivation, why we need a different way of modeling for sequences and why this is kind of
special. And then we look into some simple approaches, the simple recurrent
neural networks, and then some ideas on how you can deal with longer sequences if
you want to introduce memory and things like that. We have two different
approaches to tackle this. Then we want to compare them and in the end look a
bit into other purposes of recurrent neural networks because they can also be
used for sampling. And with that you can then also do sequence generation. Okay,
so yeah, motivation. So far we had one input, could be multi-dimensional, one
input vector, but just a single type of input and then we were interested in one
single type of output. So we had this feed-forward processing chain where you
have input, processing, and result, and that was quite useful in a broad variety of
applications. Now of course there are also many different kinds of inputs, in
particular time-dependent signals. If you think about music or speech or
videos or other sensory data, they can be time-dependent, and one big problem that
you have with time-dependent signals is that they often don't have the same
length. So if I speak fast, you're just sitting there nodding, yeah, it's early in the morning.
Okay, just joking. Okay, then you missed all the nice things that I had to say about
Sepp Hochreiter and Jürgen Schmidhuber. Good, so this is the actual setup and we see that we now
still have this concept of a state here, and if you look at this, so remember, this figure is also
something you want to prepare for the oral exam. You want to be able to draw this figure and
explain what's actually happening here. So you see that there are two memories now, there is some
H and some C, and they are also time-dependent, so they change over time. The first thing you notice:
H runs all the way through here, then there is a non-linearity, then it somehow gets influenced by
what comes from the cell state, but we have this long line here, a non-linearity, and that essentially
produces the new state. So this is very similar to what we already know from the Elman cell. We have
a state, some inputs that are associated with it, and that produces in a non-linear way the new
state. So here we essentially see the Elman cell, and here this state then also produces the observation.
So this is not connected, this is just running through here, a non-linearity. So you could say
this branch here and this output here is exactly the Elman state, the Elman cell. So not that much new.
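To make the comparison with the Elman cell a bit more concrete, here is a minimal sketch of one Elman step in NumPy. The weight names (W_x, W_h, W_y) and the choice of tanh as the non-linearity are illustrative assumptions on my side, not the exact notation from the slide.

```python
import numpy as np

def elman_step(x_t, h_prev, W_x, W_h, b_h, W_y, b_y):
    # New hidden state: previous state and current input pass through
    # a single non-linearity (tanh here).
    h_t = np.tanh(W_x @ x_t + W_h @ h_prev + b_h)
    # The same state also produces the observation/output.
    y_t = W_y @ h_t + b_y
    return h_t, y_t
```

Processing a whole sequence then just means looping this step over time and feeding h_t back in as h_prev.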
Well, there is quite a bit new here, because we need all these additional symbols, and they are
associated with the cell state. And the cell state is interesting, because it gets multiplied with
something and it gets added to something, but there is no non-linearity here. This is completely linear
memory. Everything that is happening in the cell state is linear, either multiplication or addition.
And now let's see what's happening. There are essentially two things happening. Yeah, there
is a multiplication here and an addition here. And for the first thing that is happening, we have the input
and the current state, and then there is a non-linearity. This is a sigmoid function,
meaning it produces values between 0 and 1.
Now if you multiply something with 1, what happens?
Not so exciting, nothing.
If you multiply something with 0, it's gone.
And this is a vector, C is a vector,
and now we have element-wise multiplication with 0s or 1s.
This is why this guy is called F. This is the forget gate.
So this is used to kick information out of the cell memory.
So we can delete specific entries.
We reset them so we can forget.
But if we can only forget, it's maybe not so useful.
So we have to also pick up something.
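To sketch just the forget-gate part in code, here is a minimal illustration, assuming the standard LSTM formulation where the gate sees the previous hidden state and the current input; the names W_f and b_f are my own, not taken from the slide.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_step(x_t, h_prev, c_prev, W_f, b_f):
    # Gate activation: one value between 0 and 1 per entry of the cell state.
    f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)
    # Element-wise multiplication: entries near 0 are deleted (forgotten),
    # entries near 1 pass through unchanged. As far as the cell state
    # itself is concerned, this is a purely linear operation.
    c_t = f_t * c_prev
    return c_t
```

The addition into the cell state, where new information is picked up, would then be handled by the input gate.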